05-viz

Author

Professor Shannon Ellis

Published

January 26, 2023

Effective Data Visualization

Slides modified from datascienceinabox.org

Q&A

Q: What were those columns in the NC Bike Accident Data we used?
A: Variables are described here

Q: I was confused by wrap vs grid and how should I choose between them.
A: When you want to generate a plot that uses two other variables in the dataset to determine which subset of the data to plot, grid! When you want to use a single variable to facet your data and want to specify how many columns/rows to display, wrap!

Q: When programming in R so far, I often find myself stuck at getting going on a problem and have a different time identifying on where to start. Any tips/advice on how to get past this initial bump in order to start getting through the problem?
A: This is a common struggle! This may sound like an old-person response, but jotting down what you have and what you want (like on actual paper/iPad) can be really helpful. For example, if you have 3 columns and you know you want to have 3 columns at the end, but you want fewer rows, you can draw a picture of this and help yourself realize you need a filter. Of course when there are multiple steps, the drawings become a bit more complex…but also more helpful! The same can be said for data visualization. Drawing out quickly what you want can help you get started.

Q:In what time frame will the lecture survey be available, how many hours after class will the survey be closed
A: It will be open for at least 2h.

Q:For HW1 Q7, I used read_csv and got an error message. I tried read.csv and it worked. Is there any difference between read_csv and read.csv?
A: Hmm…I’d love to take a look to see what error you got. They are similar and often behave the same way. The difference is read.csv() was made before the tidyverse, so it reads your data in as a dataframe. read_csv() is a function that “plays nicely” with the tidyverse and reads the data in as a tibble/data frame. What does that mean practically? It means that typically each one will read the data in and you’ll get the same number of rows and columns. What could differ would be the column names and/or the column types (depending upon the data). All that said, read_csv() is what I’ll recommend in this course…so that’s why I’m curious about the error you got!

Course Announcements

Due Dates:

  • Lab 03 due Friday (1/27; 11:59 PM)
  • Lecture Participation survey “due” after class

Course Announcements:

  • Lab02 Grades (Canvas) & Feedback (GitHub Issue) Posted
  • HW02 Now Available
  • Discord? - Campuswire post
  • “Vote” on posts when grades released (pink: send message; green: announce in class)

Keep it simple

Use color to draw attention

Tell a story

Credit: Angela Zoss and Eric Monson, Duke DVS

Principles for effective visualizations

Principles for effective visualizations

  • Order matters
  • Put long categories on the y-axis
  • Keep scales consistent
  • Select meaningful colors
  • Use meaningful and nonredundant labels

Data

In September 2019, YouGov survey asked 1,639 GB adults the following question:

In hindsight, do you think Britain was right/wrong to vote to leave EU?

  • Right to leave
  • Wrong to leave
  • Don’t know

Source: YouGov Survey Results, retrieved Oct 7, 2019

The Data: Code

brexit <- tibble(
  opinion = c(
    rep("Right", 664), rep("Wrong", 787), rep("Don't know", 188)
  ),
  region = c(
    rep("london", 63), rep("rest_of_south", 241), rep("midlands_wales", 145), rep("north", 176), rep("scot", 39),
    rep("london", 110), rep("rest_of_south", 257), rep("midlands_wales", 152), rep("north", 176), rep("scot", 92),
    rep("london", 24), rep("rest_of_south", 49), rep("midlands_wales", 57), rep("north", 48), rep("scot", 10)
  )
)

Order matters

Alphabetical is rarely ideal

ggplot(brexit, aes(x = opinion)) +
  geom_bar()

Order by frequency

fct_infreq: Reorder factors’ levels by frequency

ggplot(brexit, aes(x = fct_infreq(opinion))) + 
  geom_bar()

Clean up labels

ggplot(brexit, aes(x = opinion)) +
  geom_bar() +
  labs( 
    x = "Opinion", 
    y = "Count" 
  ) 

Avoiding Alphabetical Order

ggplot(brexit, aes(x = region)) +
  geom_bar()

Use inherent level order

fct_relevel: Reorder factor levels using a custom order

brexit <- brexit |>
  mutate(
    region = fct_relevel( 
      region,
      "london", "rest_of_south", "midlands_wales", "north", "scot"
    )
  )


Clean up labels

fct_recode: Change factor levels by hand

brexit <- brexit |>
  mutate(
    region = fct_recode( 
      region,
      London = "london",
      `Rest of South` = "rest_of_south",
      `Midlands / Wales` = "midlands_wales",
      North = "north",
      Scotland = "scot"
    )
  )

Put long categories on the y-axis

Long categories can be hard to read

Move them to the y-axis

ggplot(brexit, aes(y = region)) + 
  geom_bar()

And reverse the order of levels

fct_rev: Reverse order of factor levels

ggplot(brexit, aes(y = fct_rev(region))) + 
  geom_bar()

Clean up labels

ggplot(brexit, aes(y = fct_rev(region))) +
  geom_bar() +
  labs( 
    x = "Count", 
    y = "Region" 
  ) 

Pick a purpose

Segmented bar plots can be hard to read

ggplot(brexit, aes(y = region, fill = opinion)) + 
  geom_bar()

Use facets

ggplot(brexit, aes(y = opinion, fill = region)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1) 

Avoid redundancy?

Redundancy can help tell a story

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1)

Be selective with redundancy

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1) +
  guides(fill = "none") 

Use informative labels

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1) +
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?", 
    x = NULL, y = NULL
  )

A bit more info

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1) +
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019", 
    caption = "Source: https://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/x0msmggx08/YouGov%20-%20Brexit%20and%202019%20election.pdf", 
    x = NULL, y = NULL
  )

Let’s do better

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1) +
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019",
    caption = "Source: bit.ly/2lCJZVg", 
    x = NULL, y = NULL
  )

Fix up facet labels

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region,
    nrow = 1,
    labeller = label_wrap_gen(width = 12) 
  ) + 
  guides(fill = "none") +
  labs(
    title = "Was Britain right/wrong to vote to leave EU?",
    subtitle = "YouGov Survey Results, 2-3 September 2019",
    caption = "Source: bit.ly/2lCJZVg",
    x = NULL, y = NULL
  )

Select meaningful colors

Rainbow colors not always best

Manually choose colors when needed

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c( 
    "Wrong" = "red", 
    "Right" = "green", 
    "Don't know" = "gray" 
  )) 

Choosing better colors

colorbrewer2.org

Use better colors

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c(
    "Wrong" = "#ef8a62", 
    "Right" = "#67a9cf", 
    "Don't know" = "gray" 
  ))

Select theme

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c("Wrong" = "#ef8a62",
                               "Right" = "#67a9cf",
                               "Don't know" = "gray")) +
  theme_minimal() 

ggthemes described here

Customize theme

ggplot(brexit, aes(y = opinion, fill = opinion)) +
  geom_bar() +
  facet_wrap(~region, nrow = 1, labeller = label_wrap_gen(width = 12)) +
  guides(fill = "none") +
  labs(title = "Was Britain right/wrong to vote to leave EU?",
       subtitle = "YouGov Survey Results, 2-3 September 2019",
       caption = "Source: bit.ly/2lCJZVg",
       x = NULL, y = NULL) +
  scale_fill_manual(values = c("Wrong" = "#ef8a62",
                               "Right" = "#67a9cf",
                               "Don't know" = "gray")) +
  theme_minimal(base_size = 16) + 
  theme(plot.title.position = "plot", 
        panel.grid.major.y = element_blank()) 

Your Turn

  • Read in the data (Data slide)
  • Think of at least three different ways to tell slightly different stories with these data
  • Try to implement at least one of these ideas!

Recap

  • Can you determine what needs to be done to improve the effectiveness of your visualizations?
  • Can you execute said improvements using ggplot2?
  • Can you tell a story with data?

Suggested Reading